feat: improve data handling in util.py by grouping and sorting measurements #406
Pull request overview
This PR optimizes data handling in the MeasuringPointEstimator class by pre-grouping and sorting measurements during initialization instead of performing these operations repeatedly during estimation. This change improves performance by reducing redundant filtering and sorting operations.
Changes:
- Pre-grouped water level measurements by PointID during initialization
- Replaced repeated filtering and sorting operations with direct group lookup
- Added null checks for cases where PointID groups don't exist
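The pattern described in the changes above can be sketched roughly as follows. This is a minimal illustration, not the code from util.py: the class `GroupedMeasurements`, its method `for_point`, and the `DepthToWater` column are hypothetical names chosen for the example; only the `PointID` and `DateMeasured` columns come from the PR.

```python
import pandas as pd


class GroupedMeasurements:
    """Hypothetical stand-in for the estimator's data handling."""

    def __init__(self, df: pd.DataFrame) -> None:
        # Sort once, then split once; each group keeps the sorted row order.
        ordered = df.sort_values("DateMeasured")
        self._groups = {
            point_id: group
            for point_id, group in ordered.groupby("PointID", sort=False)
        }

    def for_point(self, point_id):
        # Return None instead of raising when a point has no measurements.
        return self._groups.get(point_id)


df = pd.DataFrame(
    {
        "PointID": ["A", "B", "A", "B"],
        "DateMeasured": pd.to_datetime(
            ["2021-03-01", "2020-01-15", "2020-06-01", "2022-02-02"]
        ),
        "DepthToWater": [12.5, 30.1, 11.9, 29.4],  # hypothetical value column
    }
)

store = GroupedMeasurements(df)
a = store.for_point("A")        # rows for point A, ordered by DateMeasured
missing = store.for_point("Z")  # None: no group exists for this PointID
```

Callers then check for `None` before estimating, which corresponds to the null checks listed above.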
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…handling of missing points
self._grouped = df.groupby("PointID", sort=False).apply(
    lambda g: g.sort_values("DateMeasured")
)
self.verbose = False
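Calling apply() on a groupby with a Python lambda is slow because the function runs once per group, and pandas 2.2 deprecates having apply() operate on the grouping columns. Use df.sort_values('DateMeasured').groupby('PointID', sort=False) instead: groupby preserves the original row order within each group, so sorting before grouping keeps every group ordered by date. Note that this produces a DataFrameGroupBy object rather than the DataFrame that apply() returns, so code that reads self._grouped has to look groups up (for example with get_group) rather than index a DataFrame.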
Suggested change
- self._grouped = df.groupby("PointID", sort=False).apply(
-     lambda g: g.sort_values("DateMeasured")
- )
- self.verbose = False
+ df = df.sort_values("DateMeasured")
+ self._grouped = df.groupby("PointID", sort=False)
+ self.verbose = False
Why
This PR addresses the following problem / context:
How
Implementation summary - the following was changed / added / removed:
Notes
Any special considerations, workarounds, or follow-up work to note?